SystemT: A Declarative Information Extraction System
Abstract
Emerging text-intensive enterprise applications such as social analytics and semantic search pose new challenges of scalability and usability to Information Extraction (IE) systems. This paper presents SystemT, a declarative IE system that addresses these challenges and has been deployed in a wide range of enterprise applications. SystemT facilitates the development of high-quality, complex annotators by providing a highly expressive language and an advanced development environment. It also includes a cost-based optimizer and a high-performance, flexible runtime with a minimum memory footprint. We present SystemT as a useful resource that is freely available, and as an opportunity to promote research in building scalable and usable IE systems.
Related resources
SystemT: An Algebraic Approach to Declarative Information Extraction
As information extraction (IE) becomes more central to enterprise applications, rule-based IE engines have become increasingly important. In this paper, we describe SystemT, a rule-based IE system whose basic design removes the expressivity and performance limitations of current systems based on cascading grammars. SystemT uses a declarative rule language, AQL, and an optimizer that generates h...
Towards a Scalable Enterprise Content Analytics Platform
With the tremendous growth in the volume of semi-structured and unstructured content within enterprises (e.g., email archives, customer support databases, etc.), there is increasing interest in harnessing this content to power search and business intelligence applications. Traditional enterprise infrastructure for analytics is geared towards analytics on structured data (in support of OLAP-driven...
Automatic Rule Refinement for Information Extraction
Rule-based information extraction from text is increasingly being used to populate databases and to support structured queries on unstructured text. Specification of suitable information extraction rules requires considerable skill and standard practice is to refine rules iteratively, with substantial effort. In this paper, we show that techniques developed in the context of data provenance, to...
Repairing Regular Expressions by Adding Missing Words
Regular expressions are used in many information extraction systems like YAGO, DBpedia, Gate and SystemT. However, they sometimes do not match what their creator wanted to find. We investigate how missing words can be added automatically to a regular expression by creating disjunctions at the appropriate positions. Our demo visualizes the steps that our algorithm employs to repair the regular e...
The Power of Declarative Languages: From Information Extraction to Machine Learning
As advanced analytics has become more mainstream in enterprises, usability and system-managed performance optimizations are critical for its wide adoption. As a result, there is an active interest in the design of declarative languages in several analytics areas. In this talk I will describe the efforts in IBM around three areas namely Information Extraction, Entity Resolution and Machine Learn...